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In this report the combination of an iterative technique, the conjugate gradient algorithm, with a fast direct 
method, cyclic reduction, is used to solve the linear algebraic equations resulting from discretization of a 
nonseparable elliptic partial differential equation. An expository discussion of the conjugate gradient and 
preconditioned conjugate gradient algorithms and of their use in the solution of partial differential equations is 
presented. New results extending the use of the preconditioned conjugate gradients technique to singular linear 
equations which arise from discretized elliptic equations with Neumann boundary conditions are also given. The 
algorithms are applied to solve a specific elliptic equation which arises in the study of buoyant convection pro- 
duced by a room fire. A code was developed to implement the algorithms for this application. Numerical results 
obtained through testing and use of the code are discussed. 

Key Words: Conjugate gradient algorithm; elliptic partial differential equations; iterative methods for linear 
algebraic equations; Neumann boundary conditions; sparse matrices. 


The numerical solution of the linear algebraic equations which result from a discretization of an elliptic 
partial differential equation has been, and continues to be, the focus of much research in numerical 
analysis. Over the past 25 years there have been many advances, some toward improved iterative schemes 
(ADI, SOR, etc.), others toward fast direct methods (most notably those based on FFT’s). See Rice [1] ‘ for a 
brief survey of the impact of these advances. In this paper we discuss the combination of an iterative tech- 
nique, the conjugate gradient algorithm, with a fast direct method, cyclic reduction, to extend the capabil- 
ities of the fast solver. For examples of the use of similar combinations of algorithms, see Concus & Golub 
[2], Concus, Golub and O’ Leary [3], O’ Leary [4] and O’ Leary and Widlund [5]. 

The work described in this paper resulted from a study of buoyant convection carried out at the National 
Bureau of Standards, (6, 7, 8]. The specific elliptic partial differential equation for pressure arising in this 
work is used as a model problem in the present paper. The experience of the first author with the buoyant 
convection model motivated the writing of sections 1 and 2, which gives an exposition of the conjugate gra- 
dient and preconditioned conjugate gradient algorithms. These sections contain no new material; they were 
written so that this paper may be accessible to an audience unfamiliar with the development of the conju- 
gate gradient algorithm and its use in the solution of partial differential equations. Section 3 has a discus- 
sion of the model problem, the pressure equation. Section 4 contains several new results extending the use 
of preconditioned conjugate gradient technology to the singular linear equations which result from 
Neumann boundary value problems. We conclude with some numerical examples from the buoyant convec- 
tion problem. A listing of a FORTRAN program implementing the algorithm discussed here is presented in 
{15}. 


*Center for Applied Mathematics, National Engineering Laboratory; and Mathematical Sciences Department, Johns Hopkins University. Current Address: Boeing Computer 
Services, Co., Mail Stop 9C-01, P.O. Box 24346, Seattle, Washington 98124 

+Center for Applied Mathematics, National Engineering Laboratory. 
* Figures in brackets indicate literature references at the end of this paper. 
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1. An introduction to the conjugate gradient algorithm 


The discretization of an elliptic differential operator by standard finite difference or finite element tech- 
niques produces, as the finite analog of the continuous operator, very large matrices with most elements 
zero. From an algebraic standpoint we can very often view the numerical solution of the differential equa- 
tion as: 


Solve Ax = b (1) 


where A is ak by k, (k large) real symmetric, positive definite and sparse matrix. 

Fast direct methods with minimal storage requirements exist when A is the finite difference operator 
resulting from a separable elliptic operator on a rectangular region, and for some other common, but spe- 
cific, cases. If the operator is nonseparable, or the region non-rectangular, the special direct methods may 
not apply. Moreover, the use of general direct methods such as factoring A into a Cholesky decomposition, 


A=LL', 


often impose unbearable storage requirements because many zero entries are filled in during the 
decomposition. 

Iterative methods, such as SOR and Gauss-Seidel, solve a transformation of the problem. Instead of solv- 
ing (1) directly, they ‘split’? 4 as A = M-N, and solve the equivalent problem through an iteration of the 
form Mx**' = Nx + b; i.e., find a fixed point of the equation 


Mx = Nx+b. (2) 


The efficiency of the solution method depends in part on the appropriateness of the splitting and in part on 
acceleration of the iteration toward the fixed point 

The conjugate gradient algorithm can be motivated as the solution of another transformation of problem 
(1). Consider the inner product <x,y> = x” Ay. This induces a norm ||x||, = \/ x7Ax on R* when A is positive 
definite. 
Problem (1) is equivalent to; find x to minimize 


E(x) = (x—x*)’ A(x—x*) = |x—x*| 4 (3) 


where x* is the solution of (1). The problems are equivalent because E(x) 2 0 for all x, and E(x) = 0<=>x 
= x*. The immediate usefulness of the transformation is not obvious, especially since computing E(x) 
appears to require the solution, x*, of the problem we wish to solve. Note however, that we can evaluate 
whether a given x is or is not a solution by computing the residual r = b — Ax. Further, while we cannot 
evaluate E(x) directly, we can evaluate its gradient vector. 


VE(x) = 2(Ax — Ax*) = 2(Ax — b) = —2r. 


Hence, r gives us the direction in which E decreases most rapidly (unless r is identically zero, which charac- 
terizes the solution). 
These two characteristics of problem (3) lead directly to a simple algorithm for solving (3): 


STEEPEST DESCENT: Given a guess x,, not a solution, 


set i= b — Ax; 
set Xj, = X; +4,7; 


where a; is a scalar chosen to make x;,, minimize E (x,,,) for all vectors of the form x; + fr,. 


We can easily compute the gradient ;;. It is also easy to compute a; so that E (x;,,) = E(x; + air) < E(x; + 
Br,) for all B. The mimimum of E (x; + Br;) is given by the solution of 
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But 


fp Ee +r) = 2(Br74r, —1,7r), 


so 


2 
a 2 
r Ar, “ |r|] 2 (4) 


is the optimal choice for a,. Hence, we can carry out the iteration to minimize E (x) without computing E. 
Note also that A enters the iteration only in forming the products Ax, and Ar,. 

Unfortunately steepest descent is not a practical algorithm because the convergence can be very slow. We 
can obtain more rapid convergence by using not the steepest descent directions {r,}, but instead a sequence 
of downhill or descent directions {p,} which satisfy additional properties. We characterize the term ‘‘de- 
scent direction’ and simultaneously find the optimal distance to move in that direction, through the follow- 
ing simple result. 


LEMMA: /f p’r #0, then 


T 


B= ry minimizes E(x + Bp) for all B. (5) 


PROOF: Simply differentiate E (x + Bp) with respect to p: 


gp Ete + Bo) = 2(p"A(x —x") + Bp"Ap) 


= 2(—p’r + Bp" Ap). 


It is a simple computation to show that E (x + Bp) < E(x). Note that f is positive if p’r > 0, which charac- 
terizes p as a descent, not an ascent, direction. 


The first observation which leads to improved convergence is that the choice in (5) for 8, not only solves 
the one-dimensional minimization, it gives us the p— component of x* exactly. To see this, note that A posi- 
tive definite implies that we can uniquely write 


x+Bp = apt+dw 
(6) 
x*=a*pt+d*%z 


where w’Ap = 0 = 27Ap. This last condition is read: w, z are A-orthogonal or A-conjugate to p. Now substi- 
tute the decompositions (6) into E: 


E(x +Bp) = (a —a*)*p"Ap + terms not involving a, a* orp (7) 
The choice (5) for 8 minimizes the left side of (7), and clearly a = a* minimizes the right hand side of (7). 
Hence choosing =__P’" implies that 
p’Ap 
(x +Bp)—x* = dw—d*z. (8) 
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The error vector is A-conjugate to p, i.e., lies in the k — 1 dimensional subspace of vectors ¢ such that t7Ap = 
0. If all succeeding descent directions are chosen from this subspace, we can preserve the exact solution of 
the problem in the direction p. 

The second observation is that we can easily choose successive directions p,, p2 p3, . . . such that the {p;} 
are all pairwise A-conjugate descent directions (hence are ‘‘A-conjugate gradients’’) and hence, such that 
the successive errors x, — x*, x, — x*, x; — x* are constrained to successively smaller dimensional sub- 
spaces. The set {p,}, i = 1,..,k gives a basis for R* and x, — x* must be A-conjugate to all of the {p,}, hence 
must be zero. Thus, the process must give the exact answer after at most k iterations. We now write out the 
conjugate gradient iteration directly: 


CONJUGATE GRADIENTS: Given x,, compute r, = 6 — Ax,. Set the initial descent direction p, to be r;. 
Forks 2.55 os «<5 30: 


Move to the minimum in direction p, and evaluate the new residual 


a rit i 
a= 


pi Ap, 
Xivy = X, + ap; (9) 
Vist = b — Ax;.; 
= IT; — a,Ap, 


Compute a new descent direction A-orthogonal to all of its predecessors 


ac ries Ap, 
= ee.” a 

Pi Pi (10) 
Pist = Tiss + Bip, 


The initial step (9) in the iteration is the step in the steepest descent direction p,, as discussed above. Step 
(10), the choice of a new descent direction, is simply a single step of Gram-Schmidt orthogonalization, to 
remove from the steepest descent direction its component in the direction Ap,. 
Hence 
Piss" Ap, = 0 (11) 
Futher the choice of a, implies directly that 
Kies p, = 0 (12) 


The following identities are derived easily from (9) and (10): 


ft 
rep. = irs (13) 
r Ap, = pi Api. 


The most important algebraic consequences of (9) and (10) yield only to a lengthy inductive argument (not 
given here—see Reid [12] or Hestenes and Stiefel [13] for example) 


Nia; =0 
} forallj <i (14) 
Piss" Ap, = 0 
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The identities in(14) imply that all of the {p,} are A-conjugate, even though we perform an explicit orthogon- 
alization only to one descent direction in (10). 

In the conjugate gradient iteration observe that A enters only in forming a matrix-vector product, Ap,; 
there are no transformations done which could destroy the sparseness of A. The dominant cost per step is 
usually that of forming the product Ap,; in this case the conjugate gradient iteration is very little more 
expensive per iteration than the steepest descent algorithm and it offers the guarantee of convergence 
within k steps. 

The practical reason for using the conjugate gradient algorithm is that we often obtain satisfactory accu- 
racy after only a very few iterations. Certain theoretical bounds for the convergence of the algorithm depend 
on the condition number, x, of A, defined by: 


Amax (A) 


Amin (A) 


x= 


where A,,,., (A) is the largest eigenvalue of A and A,,,, (A) is the smallest eigenvalue of A (Note: A is positive 
definite, so x is positive, and at least as great as one). The following boynds can be found in the literature 
(see Daniel [14], for example): 


E@)_(_ve-1_ yp“ 
la-Fi<t— | = 
a _ (= ) 


1 2 (é-1) 


i-J/< 
E(x) <4 neh 8 E(x,) 


ee 
1 + x 


Clearly, convergence is rapid when x is close to one (and takes place in one step if x equals one). For well- 
conditioned problems very few iterations will suffice to obtain highly accurate solutions. 


2. Preconditioned Conjugate Gradients and Matrix-Splittings. 


Our basic problem is to solve 
Ax = b, (16) 


which is the large, sparse linear system which arises from the discretization of an elliptic partial differential 
equation. We can assume that the matrix A is symmetric and positive definite, so the method of conjugate 
gradients certainly can be used. However, the condition number, x (A), is usually very large for these prob- 
lems; the conjugate gradient algorithm converges very slowly when applied directly to A and is not competi- 
tive with other methods. 

The preconditioned conjugate gradients method arises, like SOR, from matrix splittings. Consider 


4=M-N 


where M is another positive-definite, symmetric matrix, but one for which we can easily solve linear systems. 
There exists a Symmetric positive-defihite matrix M-'/? such that 


M- = (M“/2)(M-”2) 
or 


[(M-"/?) (M?)-"" = M 
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(This is the symmetric matrix whose eigenvectors are the same as M’s and whose corresponding eigenvalues 
are the reciprocals of the square roots of the eigenvalues of M. We shall not use this form, however; we need 
only the formal existence of M-'/?.) We can solve the equations Ax = b by computing 


d = M*"2b; (17) 
and solving 
Cz=d (18) 


where C = M-'/?4M~'’?, by the conjugate gradient algorithm (Note that C is a positive-definite, symmetric 
matrix), finally computing 


oa Mog (19) 


This transformation from Ax = b to Cy = d will be an effective method if: 

a) the condition number, x (C), is close to 1 so that the conjugate gradient algorithm converges rapidly, 
and 

b) the multiplication C-y for any vector y can be computed efficiently and sparsely. 

We shall examine condition (a) first. Note that 


A=M-N 
implies 
C = M“24M"/2 = | —(M'2NM"/2) = J R, 
Hence 
_ Meas F=R) 
A 


This condition number, x (C), will be close to | if A,,.. (R) is small, which is true if all of the elements of R are 
small. (The condition number is | exactly when R is O, or when M = A). We want, then, to choose M so that 
M represents A as well as possible (makes R as small as possible), and such that the choice for M allows for 
condition (b). We discuss a specific choice for M in section 3, the discussion of the model problem. Other 
examples are given in Concus, Golub, and O’ Leary [3], O’ Leary [4], and Meijerink and van der Vorst [9]. 

We now turn to condition (b); with a formal derivation of the actual algorithm used as ‘‘Preconditioned 
conjugate gradients,’ we show that condition(b) is met whenever M is such that the linear equation 


Mz=w 


can be solved efficiently and sparsely. Reconsider the conjugate gradient iteration to solve Cz = d, given an 
initial guess z;: 


(CG.): set7, = d—Cz,; 


set Dp, = 71; 
fori = 2,3,4... 
aii 
Bi Cp, 
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Zier = 2; + aip; 


= T; —a,Cp; 


a 

+ 

2 
| 


g, = tetttin 
= 


rT, 


Piss = Tin + Bip; 

In the above, the residual vectors {7} and descent directions {p,} have bars on them to denote that they 
pertain to the scaled problem Cz = d. Now view (CG.) from the original space in which we solve Ax = b. 
Define 

x = M'"z, 

r; = 6 — Ax, 
Then 

m= d—Cs, = M-*"), 

Similarily, we can define formally 

Pi = M-'"?p, 
Then 


2 Tels ~ re (as (M-'/2) r; c ro (M" r) 
“ piCp,  pi(M")* CM" 3p, — ptAp, 


Updating z, corresponds to taking 
Kier = x ta, Mp, = x, + ap; 
The corrected residual is 
Tier = M'7,,, 

= M"’? (7, —a,Cp,) 

= M'7, — a, AM"p;, 

= r,—a, A pi. 
The Gram-Schmidt coefficient 8, can be computed as: 


B * Tio’ Tiss sik Ties” (M“ Fist) 


TT, rt (M* ri). 
The new direction becomes: 


Pins = M-25,.. = MY XFias + Bip) 
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= M2 (M72 7...) + Bip, 
= Mr + Bipi. 


The result of these manipulations is that we can now write a conjugate-gradient like iteration to solve the 
equations Ax = b, but whose convergence rate depends on 


C = M'?AM’?, not A. 
PRECONDITIONED CONJUGATE-GRADIENTS (PCG): Given initial guess x,: compute 
y= Ax, 


y= b—Ax, = b-»v, 


solve Mu, = r, 
and set p, = uy, 
Fort = 2:3.4... 


compute v, = Ap,;; 
niu — riM"r, 
piv, pi Ap, 
update x,,, = x, +a,p; 

Ties = 1 — av, (= 7, — a, Ap,) 


set a, 


solve Mu,., = Tiss 


x Tey 
Tier Uier 7 Vion M Tist 
ru, 77M" r, 


set p, 
finally, Piss = Yin + Bip 

We introduced new vectors {u,} and {v,} in the algorithm above to emphasize the fact that the matrices A 
and M appear only one time during each iteration, and in very specific ways. The matrix A appears only in 
the formation of the product Ap, (or Ax,), an operation which can be done very efficiently for most represen- 
tations of sparse matrices. The matrix M appears only implicitly; we must be able to solve the linear systems 
Mu = r efficiently and in little storage, and this is the primary restriction on M. Note that the matrix 
square-roots, M'’?, do not appear at all. The purpose of the matrix M is to scale or pre-condition the problem 


so that convergence takes place very quickly. The cost, of course, is that each iteration now requires the solu- 
tion of a linear equation as well as the formation of a matrix-vector product. 


3. A model problem—the pressure equation. 


The problem to be discussed here arises in a model of buoyant convection (Rehm and Baum [6)). Specifi- 
cally, at each time step in the solution of a mixed hyperbolic-elliptic system, we are required to solve: 


y (ZajVP en) = flay) (20) 


ona rectangle R subject to the following condition on the exterior normal derivative 
oP 
an = = (xy) - ax y) on the boundary 3R. 
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Here @ denotes gas density and P the unknown pressure. We assume that 9 depends on both spatial vari- 
ables. Equation (1) then describes a nonseparable elliptic equation. Since we must solve this equation 
repeatedly, speed is paramount. Were this a separable equation, the discrete linear system discussed below 
could be solved directly by the cyclic reduction routines of Sweet and Schwarztrauber [10]. Were the density 
and its first and second derivatives known analytically, the problem could be converted to an ordinary 
Poisson equation by the techniques of Concus and Golub [2]. Neither of these conditions can be met, which 
forces us to consider the discrete linear system in more detail. 

In the context of the buoyant convection problem, we are given the density g, and g and f, only on a dis- 
crete set of points; the solution P will be produced on the same grid of discrete points. A grid suitable for the 
hydrodynamic calculations is: 


* 
(2,2) 


* » 
(2,1) (3,1) 


Ax/2 Ax 


The functions g and fare given on the interior points; g is given at the circles on the boundary. Note that the 
interior grid is offset by one half grid spacing from the boundary. 
The derivatives in(20) are replaced by second-order accurate centered finite differences. For example, 


F) alate 
Ox (zen Ox P(xy) ) 
becomes 


l | l 
Se Pte + bs) Ptesn] - 
Ax eae Ax 


l l 
——_—— ie (Plas) Pee As 9)) 
ox-*,y) Ee | 


Note that we require the reciprocal of the density at an intermediate point an O (Ax) approximation will suf- 


fice to give second order accuracy for the operator. For consistency with the hydrodynamic equations in the 
buoyant convection model, we use the reciprocal of the average density: 
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Define Q,, = @ ((i—1/2) Ax, (j—-1/2) Ay), and similarly for P,,, f,, and g,. Then the general equation for an 
interior grid point, j not adjacent to the boundary is 


fee)” 


d, d, 
—2 —2 1 2 P, 
Qiy-1 + Qiy Qiy + Qives Ay 
et el etl 
d, d, 


(21,;) 


The adjustment for points adjacent to the boundary uses the second order centered approximation to the 
boundary derivative: on boundary & (left = 1, right = 2, lower = 3, upper = 4), the terms d, are evaluated 
using the first interior mesh point and an image point (one-half grid spacing outside the region). Formally 
the terms d, are evaluated at i = 1/2, j at the left boundary for example. The discretized form of the 
Neuman boundary conditions become 


(Gp) * Cy aey + Fave) = ea/Ax= ai) (22, 
(Ge) * (yt + daPmens) = Bnerras/be= whi) (22,) 
a ra . 5 bP. b: = 47.0) = Bi1/2/ Ay = gi) (22) 
GS * (-F-dp.. + = > depines) = Binss/2/Ay = gdi) (22.) 


376 


: as al coll | 


PI 


z “ 
ah = 


pal 


vidi 4d HOU) get rouge STALE WRT imnones sat nae ppmanee auld sebaalls totem ste a: 
ayets ey, } t = gas 5 ae sowoh = = ity f . i Me ist) 4 aabaved co arviiariteb ae rbd Wy , 


yeteylewe , £ yp ee , 
giant rviges As sbintwo gaiooye hing Ya: Joan) titiog gam we hme: niteg thupe roirseni fault +2 oo. 
see Pu CNIGH \ hoxd vf paanail wt sg et eaahawed | sot oth ig « aid Set porerbews wa perdi ont 

- hs al eae . | towed ape iabeped wamrmel 
($3) Wage sya Sis aad eh & 
{cox ey 

a ae 2 i my ry 
(98) 6a seus = (on aa ae wu,’ 
vi i = : a hut ’ = lap mm q ei - i f 


For a point adjacent to boundary k, subtract equation (22,) from equation (21, ,) which eliminates the d, 
terms, and with them, all references to points i, j outside the grid. The right hand side is replaced by 


fii — Br 


Corner points are treated by subtracting (22,) and (22,) from(21,,), where the corner is adjacent to both its 
k-th and f£-th boundary. 

We arrange the grid points in the order 
{(1,1), (2,1), . . . {m,1), (1,2), (2,2), ... , (m,2),..., (Ln), ..., (m,n)} and write the equation for each point in 
this order. The result is the matrix equation 


ae 


= 
ab 


1 T3 
a Fe 


where 7, is an m x m symmetric tridiagonal matrix, and O, is an m x m diagonal matrix. We shall denote this 
matrix equation by 


Ax =b : (23) 


where 6 represents the right hand side, f, of the pressure equation, adjusted by the boundary data, g. 
This is the linear algebraic system we need to solve. We make several observations: 
1. A is symmetric 
2. A is sparse. Only five diagonals contain non-zero elements. 
3. The eigenvalues of A are all real and non-positive; A is a negative semi-definite matrix, generally of 
rank mn — |. Hence —4 is a positive semi-definite matrix. 


Although we could apply the method of conjugate gradients directly to —A, convergence would be very 
slow. O’ Leary [4] and Meijerink and Van der Vorst [9] suggest several different approaches for creating scal- 
ing matrices M to use a preconditioned conjugate gradient scheme. One general approach is to consider 
that A originated from a differential equation and let M be the discrete operator from a separable differen- 
tial equation which approximates the pressure equation. 

A specific choice is suggested by Concus and Golub [2]: Let L be the discretization of the Poisson equation 


VP=f 
with Neumann boundary conditions (on the same staggered grid). Then 
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We assume that we have available a good subroutine for solving the equation 
Lu=v, 
e.g., subroutine BLKTRI from the NCAR package of subroutines for elliptic partial differential equations 


{10}. 
Let D‘’? be the diagonal matrix with entries 


dx =V Ox i 


the square root of the ratio of corresponding main diagonal entries of A and L. Then 
M=D“Lp'2 


is a symmetric negative semi-definite sparse matrix whose main diagonal is identical to that of A. Hence, in 
the splitting 


A=M-N, 


the matrix NV has zero main diagonal and non-zero entries on at most four off-diagonals. 
In the preconditioned conjugate gradient iteration we must be able to solve 


Mz=r 


This is done without excessive storage demands by: 
a) compute w = D-'/? r (D is a diagonal matrix) 
b) solve Lu = w by cyclic reduction (BLKTRI) 
c) compute z = D"'/? uy, 


For computational convenience, to avoid the repeated multiplications by the diagonal matrix D'/? or its 
inverse, we replace the original problem 


Ax=b 

by Az= 5b 

where d= D2? 4p"? 
b= D'?5 


solve Ax = 6 by preconditioned conjugate gradients with L as the scaling matrix, and finally compute 
x= D2, 


This is the approach actually used in the computations reported in section 5. 


4. The Preconditioned Conjugate Gradient Algorithm for a 
Semi-definite Matrix. 


In our discussion of the discrete operators A and L in the previous section we have ignored one character- 
istic of these operators which renders them unsuitable for direct use in a conjugate gradient or precondi- 
tioned conjugate gradient iteration. The conjugate gradient algorithm requires that A be a positive-definite 
matrix; otherwise the function E may have zeroes other than at the solution of the linear equation. The 
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algorithm becomes transparently a maximization algorithm for negative-definite matrices. (There is no need 
to change signs implicitly or explicitly to solve linear systems involving negative definite matrices, e.g., the 
discrete Laplacian with Dirichlet boundary conditions). However, our operators A and L are both negative 
semidefinite; they have negative eigenvalues, and also, one zero eigenvalue each. 

In this section we will extend the theory of the conjugate gradient algorithm to allow for the special case 
of a semi-definite matrix with known nullspace. We shall then extend the preconditioned conjugate gradient 
algorithm to allow both A and M to be of this special type. The first part of the section will be elementary for 
readers well-versed in the conjugate gradient theory; the second part contains some new and unexpected 
results. 

The problem is simple: A is a singular matrix. This is shown easily by noting that whenever a term d, 
appears on the diagonal of A, the opposite term —d, appears on one of the off-diagonals. Hence, the sum of 
the coefficients in any row of A is exactly zero. But this is the same as saying 


Ae=0 


where e is the vector(1,1,1,...1). In general, 4x = 0 <=>x = ae, where a is a real scalar. 

The singularity of A is related to the Neumann boundary conditions for the pressure equation. If P is a 
solution to the differential equation so is P + c, where c is any constant. Similarly, if x is any solution to (3), 
so is x + ae, where a is any scalar. But this is equivalent to adding the constant a to each point x, of the 
solution. The singularity of the operators also implies that not every system of equations has a solution. A 
system of equations is consistent if and only if it has a solution. The characterization of consistency for these 
operators is simple to express: The pressure equation is consistent < = > 


SSa f= Sone (25) 


The linear equations Ax = 5 are consistent <=> 


b7e = 0, that is, when b in Range(A). But 
bre = 0 <=> Fb, =0 


<=> Efy = Se.) + edi) + Fest) + adi)) 


the discrete analog of the integral equality (25) 
Further, even when the equation is consistent, the solution is not unique. There is a unique solution of 
shortest length in the usual L, norm. For the continuous case it is the solution with mean pressure zero, i.e., 


fJ,P=0. 


In the discrete case the analog is 


again, the mean (discrete) pressure is zero. 
We shall now discuss the modifications to the conjugate gradient algorithm which would be necessary to 
obtain this unique solution to a consistent system of equations. For generality, assume that we want to solve 


Ax =b (26) 


where A is symmetric positive semi-definite of dimension & with known nullspace the span of {n}. Hence, the 
rank of A is one less than its dimension. Further, assume b7n = 0, so that (26) is a consistent system. 


(note: if (26) is not consistent, b= b- a Th 


satisfies the consistency condition, and any solution of Ax = b 


is a least squares solution of (26)). 
Let the vectors {n,q2qxq« - - - qx} form an orthonormal basis for R*. Then Range (A) = span 
{ququ--- Qh 
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Let Q be the orthogonal matrix (Q7 = Q-') 
Q=[n: q2 gp ot Gh 
Then 
Ax = b <=> (QA Q)(Q'x) = Gb (27) 


But, by the consistency condition b7n = 0 


Q7b = | 0 


a,-, 


Similarily 


Hence, (27) really takes the form 


- (28) 


Clearly w is arbitrary in(28). The equation holds whenever 
Bz=d (29) 


a (k—1) — dimensional problem. All solutions of (26) are given by 


where z solves (29). The unique minimal length solution is given by 


“= of 34 


By the assumptions on A, the matrix B is positive-definite and the conjugate gradient iteration can be 
used to solve (29). The condition number 
— Acs(B) _ (A) 
Sa es tA 
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(where we assume the eigenvalues are ordered algebraically) governs the convergence of the algorithm. 

We now show that we can solve the consistent linear system (26) directly by the conjugate gradient algo 
rithm, with convergence rate determined by x (B). The proof is elementary: we show that the algorithm, 
when started with an initial vector x, and a right hand side 6 which lie in the same invariant subspace of A, 
solves the problem while remaining entirely in that subspace. 

Suppose we have the equation 


Ax=b 


with b’n = 0 and x,’n = 0. Then the conjugate gradient algorithm, when applied to A, produces vectors x, 
r, and p; exactly equal to 


and 


where z,, r, and p, are the vectors produced by the conjugate gradient algorithm for 
Bz=d 
with initial vector z, = the last kK—] entries of Q’x,. We note that 


0 0 0 
Q*(b- Ax) = Ob - Q°A(QQ ns = t . ras | ; = | 


where it is essential that b7n = 0. It follows that 


om 6 


So assume the required properties hold for x,, r; and p,. We shall show the iteration preserves these prop- 
erties for x,.:, ri+1, and p;,,. The first step in the iteration is to compute 


Qn, 


Fat 


pi Ap; ; 


a, = 


which by induction is 


(ong Med anrrsawEG neiigvatt oct Or 


since Q7Q = J and 


Then 


That B, = B, follows from exactly the same reasoning that showed equality for the numerators of a, and @,. It 


is equally easy to show that 
‘ = + B <=> Q'p = ae -~-| = 0 P 
Pis+i Ties iPi i+l a +B, Biss 


Thus the conjugate gradient iteration will succeed in solving a consistent system. 

We should note that the condition that xjn = 0 is required only to assure that the solution also satisfies 
this property. If the minimal length solution is not desired, this condition can be ignored. The condition that 
b’n = 0, that Ax = b be consistent, is essential. If the system is not consistent, the initial residual r,, and all 
successive residuals, will have the same component in the direction of the nullspace. Suppose that b (and r, 
and p,) has a componenty -n of the nullspace. The recursion for {r,} 


Tin = Mi — a,Ap, 


shows that r,,, has a componenty-n of the nullspace for alli This, of course, also follows from the property 
that r;,, is a residual vector for Ax, and b. The directions {p;}, and the approximate solutions {x;}, have com- 
ponents of n which increase with i (in fact, rapidly). 

It suffices to examine the recursion for p, 


Piss = Tin + Bipi 


where B, > 0. If d,; is the component of n in the vector p,, 


Oi = (y + B.d,) — (y - B; (y + Bi-s ; d;-1)) —— rc 


Thus the direction’s component of the nullspace increases. At the same time, p;,, is shorter than r,,, in the A 
norm, so the relative component in the direction of the nullspace grows. In the limit, the directions may 
become nearly parallel (they are still A-orthogonal, but A does not induce a metric). We can see this also by 
looking at the limiting case when a iteration is started with an actual solution vector. The residual vector 
(which is not a descent direction) and the initial direction, are in the nullspace—the initial step a, is infinite. 

The last example given above implies that the convergence rate for obtaining the least squares solution to 
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an inconsistent system no longer depends on x (B). The convergence obtained in practice may be much 
slower than the bound obtainable for a consistent system. In practice, we are interested only in minimizing 
the residual, which we can ensure by working with a consistent system. In actual finite precision computing, 
the residuals and directions may wander slightly into the nullspace, in which case we are in the same situa- 
tion as if we started with a slightly inconsistent system. The effects will be negligible unless the rate of con- 
vergence is very slow. In any case, the effects may be suppressed by explictly reorthogonalizing the direc- 
tions p, to the nullspace. 
In the preconditioned conjugate gradient algorithm we replace the system 


Ax=b 
by 
Cz=d 
where 
C = VAP, 


V some positive definite symmetric matrix. In the special case of a semi-definite A, a natural choice for a 
scaling matrix may also be semi-definite. This is the case for the two operators discussed in section 3. We 
have two possible cases to consider: V positive definite and V semi-definite. 

The case of a positive definite V is little changed from the ordinary semi-definite conjugate gradient 
algorithm. We take M = (V*)"', or more directly, V = M~'’?. The rank of the matrix C is k—] and its 
nullspace is 


span {M'/?n,} 
where n, denotes a non-zero vector in the nullspace of A. The convergence condition for C required that 7; 
and p, be orthogonal to M'/?n,. 


The former condition will hold if Ax = 6 is consistent since then 


Ey = M"*/7(b — Ax;) 


and 
(b —Ax'n, = 0 
The latter condition is equivalent to 
pi i Mn, 
since 
pi = Mp, 


But clearly p; = M~' r, is orthogonal to M ny, since r,7n, = 0. By induction, if p,7Mn, = 0, 


Pisi’ Mn, = (rot) Mn, + B; (p;7 Mn,) = a eba = 0. 


Thus, convergence is governed by the pseudo-condition number AAC) the residual vectors r; remain those 


AC)’ 


of a consistent system, and the coefficients a, and B, are determined by an iteration remaining in a subspace 
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of the range of C. However, the scaled directions p, are allowed to venture into the nullspace of A, and 
hence, the solution vector x is no longer necessarily the minimal length solution. This may be repaired easily 
at the end of the iteration by computing 


x™ns 


natn, 


Ra 


The other natural case is one in which the scaling matrix M is also positive semi-definite, with the null- 
space generated by a known vector ny. The linear operator corresponding to M™ in the definite case will be 
M’, the pseudo-inverse of M. (For our purposes, we will need only a method for solving consistent systems 
Mz = w. Our knowledge of the nullspace of M then enables us to compute the minimal length least squares 
solution, M*u, for any right hand side uw). Formally, the matrix V appearing in the definition of C is (M* )'/?, 
so 


C = (M*)"/2,4(M*)*/2 - (6) 


Several conditions which must be met by M are immediate. Since nullspace (C) D nullspace (M), the rank of 
M must be at least the rank of A. For our specific problem rank (M) must be at least k—1. Otherwise C has 
rank at most k—2 and cannot possibly span the entire k—1 dimensional space of possible solutions to Ax = 
b. To guarantee that C has rank k—1, we must have rank (M) > k — | and also that the nullspace of M is not 
orthogonal to the nullspace of A. If the latter condition fails, we know that ny, lies in the range of M (since it 
is orthogonal to the orthogonal complement of the range of M). Hence, by symmetry of M, nz, lies in the 
range of (M*) '/*. There exists a vector z such that(M*) '/? z = n, andz 1 ny. We would have: 


Cry = (M*)'/2 4 (M*)'?2nw) = (M*)!/2 A0=0 


Cz = (M*)'/24 ((M*)'2z) = (M*)'”? An, = (M*)'20 = 0 


Thus, the dimension of the nullspace of C would be at least two. 

Recall now the motivation for the preconditioned algorithm. The rate of convergence of the conjugate 
gradient algorithm is determined by the condition number, x{A), which is large for most discrete elliptic 
operators. To accelerate the coverage of the conjugate gradient algorithm, we replaced A by the precondi- 
tioned operator C = M-'/? AM~'/?, where we presume that x(C) = 1. Our presumption can only be true if 
x(A) = x{M), i.e., the preconditioning matrix must have roughly the same poor conditioning as has A. This 
requirement follows from the inequality 


(A) x(M"") 
™(C) > max {en > dA) } 


where we note that x(M~' ) = x{M). A brief proof of this inequality is given below: 


{C) = Umax (C) / Amin (C). 


However, the eigenvalues of C are the same as the eigenvalues of M-' A. We can bound the eigenvalues of 
this latter product as: 


je (M* A) 2 Umax (Mm ) ay es (A), 
Amin (M™* A) < Aynin (M™ ) * Amaz (A). 
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It follows that 


Amin) Ama(M) 
ME) ee Tay” Tt) 


= x(M) / x(A). 
The other inequality above follows by a similar argument. 
We can obtain suitable preconditioning matrices for most positive definite discrete elliptic operators only 
by using poorly conditioned matrices. The required poor conditioning of M may be even worse in the case 
where both A and M are semi-definite. The preconditioning breaks down if 


Tha xe =) 0 


We shall now show that the preconditioning nearly breaks down, in the sense that M must be very ill-condi- 
tioned, if the nullspaces are almost orthogonal. Assume that n4 and ny have length one, and that 


na'Ny = d, 
which is much less than one. We can write A and M* in their respective eigendecompositions 
A= UD, U’, 
M = VD, V", 
where the zero eigenvalue of A and of M* is given first. Then 
C = MM"? 4M"? = SS’, 
where S is defined by 
S = VDy.'" V'U D,'”. 


For each of the matrices S, M*, A, and X, define the condition number of the matrix as the condition 
number of the matrix restricted to the orthogonal complement of its nullspace. Then 


(S) = x{(Dus'/? XX) (Da"’?)). 


In the above equation, the matrix X is the (k—1) by (k—1) matrix obtained by removing the first row and col 
umn from VU, that is, 


We now note that the inequality 


x(BC) > max x(B) aot 


x(C)’ x(B) 


holds for unsymmetric matrices B and C. (The proof follows from the triangle inequality for matrix norms, 


|| Bl|<||BC| |||.) 
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When we extend this inequality to products of three matrices we obtain the relatively weak result that 


x(B) x(C) xD) _ 
(BCD) > max WM C)dD) ’ {Bd D) ’ {Bd C) 


In the context of our preconditioned operator, only x(X) is unknown. However, Alan Cline [11] has shown 
that when d<<l, 


(X)=1/6. 


Thus, the condition number of the preconditioned operator C is bounded below by: 


x{ C) 2 (1 / 5) / (x{ A) x(M)). 


Since x{A) is fixed, this implies that x(M) must be large whenever d is small. 


V. Numerical Results 


A code, written in FORTRAN, was developed to implement the preconditioned conjugate gradients 
scheme discussed in previous sections. Care was taken in the preparation of the code to make it portable and 
to introduce many comments for clarity. The code has been run successfully under a variety of conditions 
and has been compared with analytical results to determine its accuracy. In this section a description is 
given of some of the computations used to determine the performance of this code. 

The model problem, the pressure equation, was discussed in section III. Special cases of this general non- 
separable elliptic equation were used to test the code for accuracy. All production runs of this code were per- 
formed when the code was imbedded in a larger linear or nonlinear fluid dynamics computation; timing 
studies were performed in such an environment. 

Within the code, two tests are used to terminate the conjugate gradient iteration; these depend upon two 
specified parameters, the maximum number of iterations (less than or equal to 50) and a maximum relative 
residual, ¢. (The relative residual is defined as the norm of the residual divided by the norm of the right 
hand side in the scaled problem discussed in sections 2 and 4.) In all successful computations to date, the 
iteration is terminated after a relatively small number of iterations by the relative residual norm satisfying 
the criterion that it be belowe. 

To test the code under the simplest conditions, a Poisson equation on the unit square was discretized and 
solved with homogeneous Neumann conditions applied at the boundary. In continuous form this problem is 


se re a = f (uy) = cos(xn x) cos(fn y) 


on0 <x <1 and0<y< 1. The boundary conditions are 


ae l, 
Ox 

Op _ a z 
ay = Oaty = Oandy= 1. 


The solution to this problem is 


l 
P(uy)=- hy ster cos (xn x) cos(fn y) 
When this problem is discretized to second order accuracy on the “‘staggered grid’’ (a grid displaced one 
half incremental unit dx in the x-direction and one half incremental unit dy in the y-direction) as discussed 
in section III, it can be written 
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1 wa if Pesees 2 x be = 
Ge (Pisty — 2py + pi-ry) + Spr Pees — 2p + Diy-1) = fy 


Where 
Fi, = cos (= (i— 14)] cos [= (j- 14) 
for 


1<i<mand1]l<j<n. 


Here dx is the mesh spacing in the x-direction, dx = 1/m(m is the number of mesh cells in the x-direction) 
and dy is the mesh spacing in the y-direction, dy = 1/n (n is the number of cells in the y-direction). The 
boundary conditions are 


Poy =Py and Pmsiy = Pm for l<j<n 
Pio =Pir and Pins = Din for l<i<m 


The solution to this linear algebraic system is 


By = - ————— 00s (i-14)] cos (A j-¥0) 


(2m sin) + (2n sin) 
for0 <i<m+1,0<j<n+l 


The solution to this simple discretized Poisson equation was computed using the code, and the solution 
compared with the analytical solution given above. 

A small test problem, m = n = 7, was used to determine the accuracy obtainable whene = 10° Com- 
parison with the exact solution demonstrated that the components of the solution vector obtained from the 
computation agreed to at least six significant figures with the exact solution. Generally the agreement was 
much better, being seven or eight significant figures for most values of i and j. (Note that the UNIVAC 1108, 
on which all computations were run, carries about eight significant figures.) 

As a larger test problem, the equations with m = n = 31 were solved withe = 10~ For this computation 
agreement was obtained to a few parts in the sixth significant figure. 

A second test problem, a discretized approximation to a separable elliptic equation, was also solved ana- 
lytically and using the code. Comparison of these results indicated that the accuracy was similar to that 
obtained in the first test problem. 

The code was then imbedded in a linear finite difference computation obtained from a second-order dis- 
cretization of a set of equations arising in fluid dynamics. These equations describe two-dimensional inter- 
nal gravity waves in a stratified ambient fluid within a rectangular enclosure. The interest in this problem is 
discussed, the continuous and discrete equations are presented and exact analytical solution to the con- 
tinuous and discrete problem are given by Baum and Rehm [8]. Additional linear computations using this 
code on a somewhat more general fluid-flow problem are described in Rehm and Baum [7]. In this latter 
paper the more general linear finite difference equations are given, and the computational procedure for 
solving them is presented. The manner in which the preconditioned conjugate gradients code is used in solv- 
ing the elliptic pressure equation is discussed. In all of these linear computations, the equation for the 
pressure is separable; it is only in the general, nonlinear computations that the equation for the pressure is 
nonseparable. 

For the linear computations reported in these papers, the number of iterations required to obtain a rela- 
tive residual error less thane = 10-5 or 10~° generally varied between 2 and 4. A large number of such com- 
putations have been run. 

A representative one is a calculation for which m = 15, n = 16 ande = 10°; in this computation the 
pressure-solver was called 200 times. Two iterations to convergence were taken ten of the first eleven times it 
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was called, each call using about 0.5 s of CPU time on the NBS Univac 1108. Thereafter, each subsequent 
call required only one iteration to convergence taking about 0.35 s of CPU. An indication of the rate of 
covergence of the algorithm is obtained by taking 


ne logio (Initial relative residual) — log, (final relative residual) 


number of iterations 


For the computation reported above, this rate of convergence was about 4.5 when two iterations were taken 
and about 5.5 when one iteration was taken. 

It should be noted that these computations determine a flow field which evolves with time. Except for the 
first time step, the pressure vector at the previous time step is used as the initial guess for the pressure vec- 
tor each time the elliptic-solver is called. Hence the guess at each calling is quite good and convergence is 
rapid. 

Computations of a set of nonlinear finite difference equations generalizing the linear ones described 
above have also been run. As noted previously, in this case the elliptic equation for the pressure is nonsepa- 
rable. In a limited number of computations, the PCG code has been found to perform well in this case also. 
A representative computation, one for which / = 31, J = 31 ande = 10°, was found to take between 2 and 
5 iterations for convergence at each call. The CPU time taken was slightly over 2 s for 2 iterations and slight- 
ly under 5 s for 5 iterations (very roughly a second per iteration generally). In this calculation d defined 
above was found to vary between about one and two. 

In the Appendix to [15] a listing of the elliptic-solver, called FASTSL, is given. To use this code, subrou- 
tines from EISPACK, the Argonne Code Center package, and BLKTRI, a subroutine in the NCAR package 
developed by Schwarztrauber and Sweet([10] are required. 
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APPENDIX 


DIAGRAM OF SUBROUTINE LINKAGES 
BRIEF DESCRIPTION OF SUBROUTINES 


LISTING OF ALL SUBROUTINES 
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Brief Description 


INTNEU - 


FASTSL - 


HCGSOL - 


Initializes the NCAR Routine BLKTRI to solve 
v é = f on the rectangular grid (with Neumann 
Boundary Conditions) 

Generates parameter vector PPARM which must be 
passed back to BLKTRI when we actually solve 


the pressure équation 


Separates required working storage into pieces 


for the coordinating subroutine HCGSOL 


= Coordinates the pieces 


a) Calls GENPOI to generate the discrete five 
point operator for the pressure equation. 
The matrix's representation is stored in 
the vector ' APARM! 

b) Calls NEUSCL to compute a diagonal scaling 
to make the discrete pressure equation have 
the same diagonal as the es equation. 


ce) Calls HCGSIN with the diagonally scaled 


operator. HCGSIN uses preconditioned conjugate 


gradient algorithm with 


A 


scaled pressure equation 


F 


discrete Poisson equation 
d) Calls NEUBAK to undo the diagonal scaling to 
get a solution of the original pressure 


equation 


HCGSIN - 


ATFSHL - 


NEUSOL - 


HCGSOL also coordinates diagnostic information 


Piccanstiicned congugate gradients repeated by 
calls to subroutine to compute . 
Ax for some x 
and a subroutine to solve Py = x 
for some x 
The names of the subroutines are parameters from 


HCGSOL. Here they are 


ATFSHL and NEUSOL 


Unpacks representation of scaled pressure equation 
in APARM to pass it to ATFIVE, which actually per- 


forms the matrix-vector product. 


Unpacks representation of POISSON equation in PPARM, 
to pass it.to BLKTRI to actually use cyclic re- 


duction to solve the discrete Poisson equation. 


Other Subroutines 


PROJCT, PROJCE - To compute the projection of an arbitrary vector 


on the orthogonal complement of some vector 


(PROJCT) or the vector of all one's (PROJCE) 


To compute the inner product of two vectors 
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BIROUT -- 


VECOUT - 


Prints out a symmetric block matrix with five 


non-zero diagonals used only for diagnostic 


output. 


‘Prints a vector in a grid-like form. Used only 


for diagnostic output. 


Subroutines from external libraries 


TQL1 - 


BLKTRI - 


From EISPACK. Computes all eigenvalues of tri- 
diagonal matrix, (produced implicitly by conjugate 
gradients); used to estimate convergence rate of 


conjugate gradients 


NCAR routine. Used to solve ve P = f with Neumann 


boundary conditions. 
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